In this notebook we are comparing the use of Alevin-fry and Spaceranger for quantifying spatial transcriptomics libraries. Two spatial transcriptomic libraries were quantified using Alevin-fry and Spaceranger and the results were combined following the Alevin-fry tutorial. We are comparing that to if we were to not integrate the Spaceranger data with Alevin-fry and only use Spaceranger. For simplicity we will only look at one library, SCPCR000372.
library(gridExtra)
Attaching package: ‘gridExtra’
The following object is masked from ‘package:Biobase’:
combine
The following object is masked from ‘package:BiocGenerics’:
combine
# load in benchmarking functions that will be used for copying data and generating sample tables
function_path <- file.path(".." ,"benchmarking-functions", "R")
file.path(function_path, list.files(function_path, pattern = "*.R$")) %>%
purrr::walk(source)
# set up file paths
base_dir <- here::here()
# output folder to store alevin-fry and cellranger quants from S3
data_dir <- file.path(base_dir, "data", "spatial")
s3_out <- file.path(data_dir, "data", "quants")
# results directory
results_dir <- file.path(data_dir, "results")
# sample name
sample_id <- c("SCPCR000372")
# create directories
if(!dir.exists(data_dir)){
dir.create(data_dir, recursive = TRUE)
}
if(!dir.exists(results_dir)){
dir.create(results_dir, recursive = TRUE)
}
mito_file <- file.path(base_dir, "sample-info", "Homo_sapiens.GRCh38.103.mitogenes.txt")
# read in mito genes
mito_genes <- readr::read_tsv(mito_file, col_names = "gene_id")
Rows: 111 Columns: 1
── Column specification ──────────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): gene_id
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
mito_genes <- mito_genes %>%
dplyr::pull(gene_id) %>%
unique()
Now let’s take a look at comparing the two methods of using Alevin-fry + Spaceranger to only Spaceranger for quantification. To do this, we will read in the Alevin-fry + Spaceranger combined and Spaceranger only SpatialExperiment objects separately and then merge them into one list before grabbing the per cell and per gene quality metrics.
# get path to fry output directory
fry_dir <- file.path(s3_out, "alevin-fry-knee", sample_id)
fry_dir <- paste0(fry_dir, "-spliced_intron_txome_k31-salign-cr-like-em-knee")
# paths to spatial folders
cellranger_folder <- paste0(sample_id, "-cdna-spatial")
spatial_dir <- file.path(s3_out, "cellranger", cellranger_folder, "outs", "spatial")
# read in combined fry and spaceranger spe
fry_spe <- create_fry_spaceranger_spe(fry_dir,
spatial_dir,
sample_id)
Rows: 4992 Columns: 6
── Column specification ──────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (1): barcode
dbl (5): in_tissue, array_row, array_col, pxl_row_in_fullres, pxl_col_in_fullres
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Joining, by = "barcode"
# spaceranger output paths
spaceranger_out <- file.path(s3_out, "cellranger", cellranger_folder, "outs")
# read in spaceranger output directly using read10XVisium
spaceranger_spe <- create_spaceranger_spe(spaceranger_out, sample_id)
Now that we have read in the data and created our two SpatialExperiment objects, we can go ahead and combine them into one list and then calculate the per spot QC metrics using scuttle::addPerCellQCMetrics().
# create one list with both spe's together
all_spe_list <- list(fry_spe, spaceranger_spe)
names(all_spe_list) <- c("alevin-fry", "spaceranger")
# calculate per cell QC and output to a combined data frame with plotting
all_spe_list <- all_spe_list %>%
purrr::map(
~ scuttle::addPerCellQCMetrics(.x,
subsets = list(mito = mito_genes[mito_genes %in% rownames(.x)]))) %>%
purrr::map(scuttle::addPerFeatureQCMetrics)
After adding in the per spot QC metrics to both of the spe’s, we want to extract the colData from each spe and create a data frame that we can use for plotting. We will also need some information about each sample and how it was run, so we will create a sample metadata table, sample_info_df that will then be merged with the colData.
# create sample info dataframe to be joined with per spot dataframe later
sample_info_df <- quant_info_table(data_dir= s3_out,
tools = c("cellranger", "alevin-fry-knee"),
samples = sample_id) %>%
# convert cellranger to spaceranger
dplyr::mutate(tool = ifelse(tool == "cellranger", "spaceranger", tool))
When we convert the colData to a data frame we use the custom function, spatial_coldata_to_df() to do so and apply it to each spe in our list.
# join coldata dataframe with sample info
coldata_df <- purrr::map_df(all_spe_list, spatial_coldata_to_df, .id = "tool") %>%
# remove extra -1 from spaceranger barcodes
dplyr::mutate(spot_id = gsub("-1", "", spot_id)) %>%
dplyr::left_join(sample_info_df,
by = c("tool"))
Now we only want to filter our data frame to contain spots that are shared between both tools and those that are found to be overlapping with the tissue.
# identify shared spots only
spot_counts <- coldata_df %>%
dplyr::count(spot_id, sample)
common_spots <- spot_counts %>%
dplyr::filter(n == 2) %>%
dplyr::pull(spot_id)
coldata_df_common <- coldata_df %>%
dplyr::filter(spot_id %in% common_spots,
# only include spots that overlap with tissue
in_tissue == 1)
We will also need to filter the spe’s directly based on spots that are present in the tissue, so we create a small function to do this and then apply it to both spe’s in the list.
# we will also want to filter the spe's directly
filter_spe <- function(spe){
spe <- spe[, spatialData(spe)$in_tissue == 1]
}
all_spe_filter <- all_spe_list %>%
purrr::map(filter_spe)
When we look at our results, we will also want to visualize them so we will make a custom function to plot the results. Because of the bug that is present in SpatialExperiment::read10XVisium() function, we will make two separate functions, one for spaceranger only spes and one for alevin-fry + spaceranger spes.
# custom function for plotting spe results and coloring by column of colData of choice
plot_fry_spe <- function(spe, sample, column){
# plot spots only
p1 <- ggspavis::plotSpots(spe,
x_coord = "pxl_col_in_fullres",
y_coord = "pxl_row_in_fullres",
annotate = column) +
scale_color_viridis_c()
# plot with tissue underneath
p2 <- ggspavis::plotVisium(spe,
x_coord = "pxl_col_in_fullres",
y_coord = "pxl_row_in_fullres",
fill = column) +
scale_fill_viridis_c()
# arrange plots and add sample name as title
grid.arrange(p1, p2, nrow = 1, top = grid::textGrob(sample))
}
plot_spaceranger_spe <- function(spe, sample, column){
# plot spots only
p1 <- ggspavis::plotSpots(spe,
# switch the order of row and col to flip the image
x_coord = "pxl_row_in_fullres",
y_coord = "pxl_col_in_fullres",
annotate = column) +
scale_color_viridis_c()
# plot with tissue underneath
p2 <- ggspavis::plotVisium(spe,
x_coord = "pxl_row_in_fullres",
y_coord = "pxl_col_in_fullres",
fill = column) +
scale_fill_viridis_c()
# arrange plots and add sample name as title
grid.arrange(p1, p2, nrow = 1, top = grid::textGrob(sample))
}
First we will look at the per cell metrics: mitochondrial reads per cell, total UMI per cell, and total genes detected per cell.
# % mitochondrial reads/ spot
ggplot(coldata_df_common, aes(x = tool, y = subsets_mito_percent, fill = tool)) +
geom_boxplot() +
theme_classic() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
ylab("Mito Percent") +
xlab("")
plot_fry_spe(all_spe_filter$`alevin-fry`, "alevin-fry", "subsets_mito_percent")
Scale for 'colour' is already present. Adding another scale for 'colour', which will replace the
existing scale.
Scale for 'fill' is already present. Adding another scale for 'fill', which will replace the
existing scale.
plot_spaceranger_spe(all_spe_filter$spaceranger, "spaceranger", "subsets_mito_percent")
Scale for 'colour' is already present. Adding another scale for 'colour', which will replace the
existing scale.
Scale for 'fill' is already present. Adding another scale for 'fill', which will replace the
existing scale.
Overall it looks like mitochondrial content is low and fairly similar across both tools.
# total UMI/ spot
ggplot(coldata_df_common, aes(x = sum, color = tool)) +
geom_density() +
theme_classic() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
ylab("UMI/spot") +
xlab("")
plot_fry_spe(all_spe_filter$`alevin-fry`, "alevin-fry", "sum")
Scale for 'colour' is already present. Adding another scale for 'colour', which will replace the
existing scale.
Scale for 'fill' is already present. Adding another scale for 'fill', which will replace the
existing scale.
plot_spaceranger_spe(all_spe_filter$spaceranger, "spaceranger", "sum")
Scale for 'colour' is already present. Adding another scale for 'colour', which will replace the
existing scale.
Scale for 'fill' is already present. Adding another scale for 'fill', which will replace the
existing scale.
# total genes/ spot
ggplot(coldata_df_common, aes(x = detected, color = tool)) +
geom_density() +
theme_classic() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
ylab("Genes detected/spot") +
xlab("")
plot_fry_spe(all_spe_filter$`alevin-fry`, "alevin-fry", "detected")
Scale for 'colour' is already present. Adding another scale for 'colour', which will replace the
existing scale.
Scale for 'fill' is already present. Adding another scale for 'fill', which will replace the
existing scale.
plot_spaceranger_spe(all_spe_filter$spaceranger, "spaceranger", "detected")
Scale for 'colour' is already present. Adding another scale for 'colour', which will replace the
existing scale.
Scale for 'fill' is already present. Adding another scale for 'fill', which will replace the
existing scale.
Generally it looks like both tools are fairly similar in the number of UMI/cell and genes detected/cell.
Let’s also look at the correlation of mean gene expression across shared genes. We will first need to calculate the per feature QC on the filtered spes after removing spots not present in the tissue and then grab the rowData and combine into a data frame used for plotting.
all_spe_filter <- all_spe_filter %>%
purrr::map(scuttle::addPerFeatureQCMetrics)
# grab rowdata and combine with sample info
rowdata_df <- purrr::map_df(all_spe_filter, spatial_rowdata_to_df, .id = "tool") %>%
dplyr::left_join(sample_info_df,
by = c("tool"))
We then want to filter out any lowly detected genes, (detected < 5.0) and restrict our analysis to those genes that are found in both tools.
gene_counts <- rowdata_df %>%
# remove genes that have a low frequency of being detected
dplyr::filter(detected >= 5.0) %>%
dplyr::count(gene_id, sample)
# restrict to only common genes
common_genes <- gene_counts %>%
dplyr::filter(n == 2) %>%
dplyr::pull(gene_id)
rowdata_df_common <- rowdata_df %>%
dplyr::filter(
(gene_id %in% common_genes)
)
# create a table to calculate correlation between mean gene expression
rowdata_cor <- rowdata_df_common %>%
dplyr::select(tool, gene_id, sample, mean) %>%
# spread the mean expression stats to one column per caller
tidyr::pivot_wider(id_cols = c(gene_id, sample),
names_from = c("tool"),
values_from = mean) %>%
# drop rows with NA values to ease correlation calculation
tidyr::drop_na()
# look at correlation between the two tools
rowdata_cor %>%
dplyr::group_by(sample) %>%
dplyr::summarize(
alevin_fry_knee_spaceranger_cor = cor(`spaceranger`, `alevin-fry`, method = "spearman")
)
# mean gene expression across shared genes
ggplot(rowdata_cor, aes(x = `spaceranger`, y = `alevin-fry`)) +
geom_point(size = 0.5, alpha = 0.1) +
scale_x_log10() +
scale_y_log10() +
labs(x = "Spaceranger mean gene expression", y = "Alevin Fry Mean gene expression") +
theme_classic()
Correlation appears to be quite high between mean gene expression in Spaceranger and Alevin-fry, however, we do see that there is a subset of genes that have higher gene expression in Spaceranger than in Alevin-fry and are slightly off the diagonal.
sessioninfo::session_info()
─ Session info 🌓 💇 💃🏿 ───────────────────────────────────────────────────────────────────────
hash: first quarter moon, person getting haircut, woman dancing: dark skin tone
setting value
version R version 4.1.1 (2021-08-10)
os macOS Catalina 10.15.7
system x86_64, darwin17.0
ui RStudio
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/Chicago
date 2021-11-22
rstudio 1.4.1106 Tiger Daylily (desktop)
pandoc 2.11.4 @ /Applications/RStudio.app/Contents/MacOS/pandoc/ (via rmarkdown)
─ Packages ─────────────────────────────────────────────────────────────────────────────────────────
! package * version date (UTC) lib source
assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.1.0)
beachmat 2.10.0 2021-10-26 [1] Bioconductor
Biobase * 2.54.0 2021-10-26 [1] Bioconductor
BiocGenerics * 0.40.0 2021-10-26 [1] Bioconductor
BiocManager 1.30.16 2021-06-15 [1] CRAN (R 4.1.0)
V BiocParallel 1.28.0 2021-11-18 [1] Bioconductor (on disk 1.28.1)
bit 4.0.4 2020-08-04 [1] CRAN (R 4.1.0)
bit64 4.0.5 2020-08-30 [1] CRAN (R 4.1.0)
bitops 1.0-7 2021-04-24 [1] CRAN (R 4.1.0)
bslib 0.3.1 2021-10-06 [1] CRAN (R 4.1.0)
cli 3.1.0 2021-10-27 [1] CRAN (R 4.1.1)
colorspace 2.0-2 2021-06-24 [1] CRAN (R 4.1.0)
crayon 1.4.2 2021-10-29 [1] CRAN (R 4.1.0)
DBI 1.1.1 2021-01-15 [1] CRAN (R 4.1.0)
DelayedArray 0.20.0 2021-10-26 [1] Bioconductor
DelayedMatrixStats 1.16.0 2021-10-26 [1] Bioconductor
digest 0.6.28 2021-09-23 [1] CRAN (R 4.1.0)
dplyr 1.0.7 2021-06-18 [1] CRAN (R 4.1.0)
dqrng 0.3.0 2021-05-01 [1] CRAN (R 4.1.0)
DropletUtils 1.14.1 2021-11-08 [1] Bioconductor
edgeR 3.36.0 2021-10-26 [1] Bioconductor
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.0)
evaluate 0.14 2019-05-28 [1] CRAN (R 4.1.0)
fansi 0.5.0 2021-05-25 [1] CRAN (R 4.1.0)
farver 2.1.0 2021-02-28 [1] CRAN (R 4.1.0)
fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.0)
generics 0.1.1 2021-10-25 [1] CRAN (R 4.1.0)
GenomeInfoDb * 1.30.0 2021-10-26 [1] Bioconductor
GenomeInfoDbData 1.2.7 2021-11-16 [1] Bioconductor
V GenomicRanges * 1.46.0 2021-11-18 [1] Bioconductor (on disk 1.46.1)
ggplot2 * 3.3.5 2021-06-25 [1] CRAN (R 4.1.0)
ggside 0.1.3 2021-10-24 [1] CRAN (R 4.1.0)
ggspavis 1.0.0 2021-10-26 [1] Bioconductor
ggupset * 0.3.0 2020-05-05 [1] CRAN (R 4.1.0)
glue 1.5.0 2021-11-07 [1] CRAN (R 4.1.0)
gridExtra * 2.3 2017-09-09 [1] CRAN (R 4.1.0)
grr 0.9.5 2016-08-26 [1] CRAN (R 4.1.0)
gtable 0.3.0 2019-03-25 [1] CRAN (R 4.1.0)
HDF5Array 1.22.1 2021-11-14 [1] Bioconductor
here 1.0.1 2020-12-13 [1] CRAN (R 4.1.0)
hms 1.1.1 2021-09-26 [1] CRAN (R 4.1.0)
htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.1.0)
IRanges * 2.28.0 2021-10-26 [1] Bioconductor
jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.1.0)
jsonlite 1.7.2 2020-12-09 [1] CRAN (R 4.1.0)
knitr 1.36 2021-09-29 [1] CRAN (R 4.1.0)
labeling 0.4.2 2020-10-20 [1] CRAN (R 4.1.0)
lattice 0.20-45 2021-09-22 [1] CRAN (R 4.1.0)
lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.1.0)
limma 3.50.0 2021-10-26 [1] Bioconductor
locfit 1.5-9.4 2020-03-25 [1] CRAN (R 4.1.0)
magick 2.7.3 2021-08-18 [1] CRAN (R 4.1.0)
magrittr * 2.0.1 2020-11-17 [1] CRAN (R 4.1.0)
Matrix 1.3-4 2021-06-01 [1] CRAN (R 4.1.1)
Matrix.utils 0.9.8 2020-02-26 [1] CRAN (R 4.1.0)
MatrixGenerics * 1.6.0 2021-10-26 [1] Bioconductor
matrixStats * 0.61.0 2021-09-17 [1] CRAN (R 4.1.0)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.1.0)
pillar 1.6.4 2021-10-18 [1] CRAN (R 4.1.1)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.0)
purrr 0.3.4 2020-04-17 [1] CRAN (R 4.1.0)
R.methodsS3 1.8.1 2020-08-26 [1] CRAN (R 4.1.0)
R.oo 1.24.0 2020-08-26 [1] CRAN (R 4.1.0)
R.utils 2.11.0 2021-09-26 [1] CRAN (R 4.1.0)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.1.0)
Rcpp 1.0.7 2021-07-07 [1] CRAN (R 4.1.0)
RCurl 1.98-1.5 2021-09-17 [1] CRAN (R 4.1.0)
readr 2.1.0 2021-11-11 [1] CRAN (R 4.1.0)
rhdf5 2.38.0 2021-10-26 [1] Bioconductor
rhdf5filters 1.6.0 2021-10-26 [1] Bioconductor
Rhdf5lib 1.16.0 2021-10-26 [1] Bioconductor
rjson 0.2.20 2018-06-08 [1] CRAN (R 4.1.0)
rlang 0.4.12 2021-10-18 [1] CRAN (R 4.1.1)
rmarkdown 2.11 2021-09-14 [1] CRAN (R 4.1.0)
rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.1.0)
rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.0)
S4Vectors * 0.32.2 2021-11-07 [1] Bioconductor
sass 0.4.0 2021-05-12 [1] CRAN (R 4.1.0)
scales 1.1.1 2020-05-11 [1] CRAN (R 4.1.0)
scpcaTools 0.1.2 2021-10-12 [1] Github (AlexsLemonade/scpcaTools@2cdad4c)
scuttle 1.4.0 2021-10-26 [1] Bioconductor
sessioninfo 1.2.1 2021-11-02 [1] CRAN (R 4.1.0)
SingleCellExperiment * 1.16.0 2021-10-26 [1] Bioconductor
sparseMatrixStats 1.6.0 2021-10-26 [1] Bioconductor
SpatialExperiment * 1.4.0 2021-10-26 [1] Bioconductor
stringi 1.7.5 2021-10-04 [1] CRAN (R 4.1.1)
stringr 1.4.0 2019-02-10 [1] CRAN (R 4.1.0)
SummarizedExperiment * 1.24.0 2021-10-26 [1] Bioconductor
tibble 3.1.6 2021-11-07 [1] CRAN (R 4.1.0)
tidyr 1.1.4 2021-09-27 [1] CRAN (R 4.1.0)
tidyselect 1.1.1 2021-04-30 [1] CRAN (R 4.1.0)
tinytex 0.35 2021-11-04 [1] CRAN (R 4.1.0)
tzdb 0.2.0 2021-10-27 [1] CRAN (R 4.1.1)
utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.0)
vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.0)
viridisLite 0.4.0 2021-04-13 [1] CRAN (R 4.1.0)
vroom 1.5.6 2021-11-10 [1] CRAN (R 4.1.0)
withr 2.4.2 2021-04-18 [1] CRAN (R 4.1.0)
xfun 0.28 2021-11-04 [1] CRAN (R 4.1.0)
XVector 0.34.0 2021-10-26 [1] Bioconductor
yaml 2.2.1 2020-02-01 [1] CRAN (R 4.1.0)
zlibbioc 1.40.0 2021-10-26 [1] Bioconductor
[1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library
V ── Loaded and on-disk version mismatch.
────────────────────────────────────────────────────────────────────────────────────────────────────